8 research outputs found
Evaluating and Improving the Coreference Capabilities of Machine Translation Models
Machine translation (MT) requires a wide range of linguistic capabilities,
which current end-to-end models are expected to learn implicitly by observing
aligned sentences in bilingual corpora. In this work, we ask: how well do
MT models learn coreference resolution from implicit signal? To answer this
question, we develop an evaluation methodology that derives coreference
clusters from MT output and evaluates them without requiring annotations in the
target language. We further evaluate several prominent open-source and
commercial MT systems, translating from English to six target languages, and
compare them to state-of-the-art coreference resolvers on three challenging
benchmarks. Our results show that the monolingual resolvers greatly outperform
MT models. Motivated by this result, we experiment with different methods for
incorporating the output of coreference resolution models in MT, showing
improvement over strong baselines. Comment: EACL paper
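The evaluation methodology above derives target-side coreference clusters from MT output. One simple way to obtain such clusters, sketched below under assumptions of our own (the function name and the simplified one-to-one word alignment are illustrative, not the paper's exact procedure), is to project source-side mention spans through a word alignment and keep clusters that survive with at least two mentions:

```python
def project_clusters(src_clusters, alignment):
    """Project source-side coreference clusters onto MT output.

    src_clusters: list of clusters; each cluster is a list of
        (start, end) source-token spans (end exclusive).
    alignment: dict mapping a source token index to a target
        token index (a simplified one-to-one word alignment).
    Returns target-side clusters as lists of sorted target-index tuples.
    """
    tgt_clusters = []
    for cluster in src_clusters:
        tgt_cluster = []
        for start, end in cluster:
            tgt_tokens = sorted(
                alignment[i] for i in range(start, end) if i in alignment
            )
            if tgt_tokens:  # drop mentions with no aligned target tokens
                tgt_cluster.append(tuple(tgt_tokens))
        if len(tgt_cluster) > 1:  # a cluster needs at least two mentions
            tgt_clusters.append(tgt_cluster)
    return tgt_cluster if False else tgt_clusters
```

The projected clusters can then be compared against clusters produced by a monolingual resolver on the same MT output, which avoids any need for gold coreference annotations in the target language.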
From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization
Key Point Analysis (KPA) has recently been proposed for deriving fine-grained
insights from collections of textual comments. KPA extracts the main points in
the data as a list of concise sentences or phrases, termed key points, and
quantifies their prevalence. While key points are more expressive than word
clouds and key phrases, making sense of a long, flat list of key points, which
often express related ideas in varying levels of granularity, may still be
challenging. To address this limitation of KPA, we introduce the task of
organizing a given set of key points into a hierarchy, according to their
specificity. Such hierarchies may be viewed as a novel type of Textual
Entailment Graph. We develop ThinkP, a high quality benchmark dataset of key
point hierarchies for business and product reviews, obtained by consolidating
multiple annotations. We compare different methods for predicting pairwise
relations between key points, and for inferring a hierarchy from these pairwise
predictions. In particular, for the task of computing pairwise key point
relations, we achieve significant gains over existing strong baselines by
applying directional distributional similarity methods to a novel
distributional representation of key points, and further boost performance via
weak supervision. Comment: ACL 202
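The second step described above, inferring a hierarchy from pairwise specificity predictions, can be sketched as a greedy parent assignment. This is a minimal illustration with an assumed `score(child, parent)` interface, not the paper's actual inference method (which must also handle cycles and conflicting predictions):

```python
def build_hierarchy(key_points, score, threshold=0.5):
    """Greedily attach each key point to its best more-general parent.

    score(child, parent) -> float: how strongly `child` is a
    specialization of the more general `parent` (an assumed
    interface, not ThinkP's exact scorer).
    Returns {key_point: parent}; top-level points map to None.
    Note: this naive sketch does not guard against cycles.
    """
    parent_of = {}
    for kp in key_points:
        candidates = [(score(kp, other), other)
                      for other in key_points if other != kp]
        best_score, best_parent = max(candidates, default=(0.0, None))
        # only attach when the directional evidence is strong enough
        parent_of[kp] = best_parent if best_score >= threshold else None
    return parent_of
```

With such a routine, the flat KPA output becomes a tree in which specific complaints (e.g. about battery drain) sit under their more general counterparts (e.g. about battery life).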
CD^2CR: Co-reference Resolution Across Documents and Domains
Cross-document co-reference resolution (CDCR) is the task of identifying and
linking mentions to entities and concepts across many text documents. Current
state-of-the-art models for this task assume that all documents are of the same
type (e.g. news articles) or fall under the same theme. However, it is also
desirable to perform CDCR across different domains (type or theme). A
particular use case we focus on in this paper is the resolution of entities
mentioned across scientific work and newspaper articles that discuss them.
Identifying the same entities and corresponding concepts in both scientific
articles and news can help scientists understand how their work is represented
in mainstream media. We propose a new task and English-language dataset for
cross-document cross-domain co-reference resolution (CD^2CR). The task aims
to identify links between entities across heterogeneous document types. We show
that in this cross-domain, cross-document setting, existing CDCR models do not
perform well and we provide a baseline model that outperforms current
state-of-the-art CDCR models on CD^2CR. Our dataset, annotation tool and
guidelines as well as our model for cross-document cross-domain co-reference
are all supplied as open-access, open-source resources. Comment: 9 pages, 5 figures, accepted at EACL 202
LingMess: Linguistically Informed Multi Expert Scorers for Coreference Resolution
While coreference resolution typically involves various linguistic
challenges, recent models are based on a single pairwise scorer for all types
of pairs. We present LingMess, a new coreference model that defines different
categories of coreference cases and optimizes multiple pairwise scorers, where
each scorer learns a specific set of linguistic challenges. Our model
substantially improves pairwise scores for most categories and improves
cluster-level performance on OntoNotes. Our model is available at
https://github.com/shon-otmazgin/lingmess-core
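The core idea of routing each mention pair to a category-specific expert scorer can be illustrated as follows. The category taxonomy here (pronoun-pronoun, pronoun-entity, exact-match, other) is a simplified stand-in; LingMess's actual linguistic categories differ in detail:

```python
# A small closed set of pronouns, used only for this illustration.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its"}

def pair_category(m1, m2):
    """Assign a mention pair to a coarse linguistic category
    (an illustrative taxonomy, not LingMess's exact one)."""
    p1, p2 = m1.lower() in PRONOUNS, m2.lower() in PRONOUNS
    if p1 and p2:
        return "pron-pron"
    if p1 or p2:
        return "pron-ent"
    return "ent-match" if m1.lower() == m2.lower() else "ent-other"

def score_pair(m1, m2, experts):
    """Route a mention pair to the expert scorer for its category.
    experts: dict mapping category name -> callable(m1, m2) -> float."""
    return experts[pair_category(m1, m2)](m1, m2)
```

In the full model each expert would be a learned neural scorer over span representations; routing pairs this way lets each scorer specialize on one class of coreference decisions instead of averaging over all of them.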